db <- read.csv("Data Export Summary.csv", row.names = 1)
head(db)
## X.ray NMR EM Multiple.methods Neutron Other Total
## Protein (only) 142303 11804 5999 177 70 32 160385
## Protein/Oligosaccharide 8414 31 979 5 0 0 9429
## Protein/NA 7491 274 1986 3 0 0 9754
## Nucleic acid (only) 2368 1372 60 8 2 1 3811
## Other 149 31 3 0 0 0 183
## Oligosaccharide (only) 11 6 0 1 0 4 22
Q1: What percentage of structures in the PDB are solved by X-Ray and Electron Microscopy.
method.sums <- colSums(db)
round((method.sums/method.sums["Total"]) * 100,2)
## X.ray NMR EM Multiple.methods
## 87.55 7.36 4.92 0.11
## Neutron Other Total
## 0.04 0.02 100.00
Q2: What proportion of structures in the PDB are protein?
round((db$Total/method.sums["Total"]) * 100, 2)
## [1] 87.36 5.14 5.31 2.08 0.10 0.01
The proportion is 87.36 >Q3: Type HIV in the PDB website search box on the home page and determine how many HIV-1 protease structures are in the current PDB?
There are 1828 protease structures
Q4: Water molecules normally have 3 atoms. Why do we see just one atom per water molecule in this structure?
The red spheres are being diplayed as the water molecule. >Q5: There is a conserved water molecule in the binding site. Can you identify this water molecule? What residue number does this water molecule have (see note below)?
Residue number is 135
library(bio3d)
pdb <- read.pdb("1hsg")
## Note: Accessing on-line PDB file
read.pdb(file = "1hsg")
## Note: Accessing on-line PDB file
## Warning in get.pdb(file, path = tempdir(), verbose = FALSE): /var/folders/91/
## yck4gs8x25xfgmj1cybpkktw0000gn/T//RtmpStLAfs/1hsg.pdb exists. Skipping download
##
## Call: read.pdb(file = "1hsg")
##
## Total Models#: 1
## Total Atoms#: 1686, XYZs#: 5058 Chains#: 2 (values: A B)
##
## Protein Atoms#: 1514 (residues/Calpha atoms#: 198)
## Nucleic acid Atoms#: 0 (residues/phosphate atoms#: 0)
##
## Non-protein/nucleic Atoms#: 172 (residues: 128)
## Non-protein/nucleic resid values: [ HOH (127), MK1 (1) ]
##
## Protein sequence:
## PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYD
## QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNFPQITLWQRPLVTIKIGGQLKE
## ALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVLVGPTP
## VNIIGRNLLTQIGCTLNF
##
## + attr: atom, xyz, seqres, helix, sheet,
## calpha, remark, call
print(pdb)
##
## Call: read.pdb(file = "1hsg")
##
## Total Models#: 1
## Total Atoms#: 1686, XYZs#: 5058 Chains#: 2 (values: A B)
##
## Protein Atoms#: 1514 (residues/Calpha atoms#: 198)
## Nucleic acid Atoms#: 0 (residues/phosphate atoms#: 0)
##
## Non-protein/nucleic Atoms#: 172 (residues: 128)
## Non-protein/nucleic resid values: [ HOH (127), MK1 (1) ]
##
## Protein sequence:
## PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYD
## QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNFPQITLWQRPLVTIKIGGQLKE
## ALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVLVGPTP
## VNIIGRNLLTQIGCTLNF
##
## + attr: atom, xyz, seqres, helix, sheet,
## calpha, remark, call
Q7: How many amino acid residues are there in this pdb object? 198
Q8: Name one of the two non-protein residues?
HOH >Q9: How many protein chains are in this structure?
2
attributes(pdb)
## $names
## [1] "atom" "xyz" "seqres" "helix" "sheet" "calpha" "remark" "call"
##
## $class
## [1] "pdb" "sse"
head(pdb$atom)
## type eleno elety alt resid chain resno insert x y z o b
## 1 ATOM 1 N <NA> PRO A 1 <NA> 29.361 39.686 5.862 1 38.10
## 2 ATOM 2 CA <NA> PRO A 1 <NA> 30.307 38.663 5.319 1 40.62
## 3 ATOM 3 C <NA> PRO A 1 <NA> 29.760 38.071 4.022 1 42.64
## 4 ATOM 4 O <NA> PRO A 1 <NA> 28.600 38.302 3.676 1 43.40
## 5 ATOM 5 CB <NA> PRO A 1 <NA> 30.508 37.541 6.342 1 37.87
## 6 ATOM 6 CG <NA> PRO A 1 <NA> 29.296 37.591 7.162 1 38.40
## segid elesy charge
## 1 <NA> N <NA>
## 2 <NA> C <NA>
## 3 <NA> C <NA>
## 4 <NA> O <NA>
## 5 <NA> C <NA>
## 6 <NA> C <NA>